31 research outputs found

    Exploring synergetic effects of dimensionality reduction and resampling tools on hyperspectral imagery data classification

    This paper addresses the classification of hyperspectral images with multiple imbalanced classes and very high dimensionality. Class imbalance is handled by resampling the data set, whereas PCA and a supervised filter are applied to reduce the number of spectral bands. This preliminary study investigates the benefits of combining several techniques to tackle the imbalance and high-dimensionality problems, and evaluates which order of application leads to the best classification performance. Experimental results demonstrate the significance of using these two preprocessing tools together to improve the performance of hyperspectral imagery classification. Although the most effective order appears to be a resampling strategy first, followed by a feature selection (or extraction) algorithm, this question still needs much more thorough investigation in the future. This work has partially been supported by the Spanish Ministry of Education and Science under grants CSD2007–00018, AYA2008–05965–0596 and TIN2009–14205, the Fundació Caixa Castelló–Bancaixa under grant P1–1B2009–04, and the Generalitat Valenciana under grant PROMETEO/2010/02.

    Double triage to identify poorly annotated genes in maize: The missing link in community curation

    The sophistication of gene prediction algorithms and the abundance of RNA-based evidence for the maize genome may suggest that manual curation of gene models is no longer necessary. However, quality metrics generated by the MAKER-P gene annotation pipeline identified 17,225 of 130,330 (13%) protein-coding transcripts in the B73 Reference Genome V4 gene set with models of low concordance to available biological evidence. Working with eight graduate students, we used the Apollo annotation editor to curate 86 transcript models flagged by quality metrics and by a complementary method using the Gramene gene tree visualizer. All of the triaged models had significant errors, including missing or extra exons, non-canonical splice sites, and incorrect UTRs. A correct transcript model existed for about 60% of genes (or transcripts) flagged by quality metrics; we attribute this to the convention of elevating the transcript with the longest coding sequence (CDS) to the canonical, or first, position. The remaining 40% of flagged genes resulted in novel annotations and represent a manual curation space of about 10% of the maize genome (~4,000 protein-coding genes). MAKER-P metrics have a specificity of 100% and a sensitivity of 85%; the gene tree visualizer has a specificity of 100%. Together with the Apollo graphical editor, our double triage provides an infrastructure to support the community curation of eukaryotic genomes by scientists, students, and potentially even citizen scientists. © 2019 This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
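The figures quoted in the abstract above can be checked with a little arithmetic. The following is a minimal sketch: it recomputes the flagged fraction from the two transcript counts given in the abstract and spells out the standard definitions of sensitivity and specificity. The function names are illustrative, and the confusion-matrix counts in the last lines are hypothetical values chosen only to reproduce the quoted percentages; they are not taken from the paper.

```python
def fraction_flagged(flagged: int, total: int) -> float:
    """Share of transcripts flagged as poorly supported."""
    return flagged / total


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly erroneous models that the triage flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of correct models that the triage leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Counts quoted in the abstract: 17,225 flagged out of 130,330 transcripts.
share = fraction_flagged(17_225, 130_330)
print(f"flagged share: {share:.1%}")

# Hypothetical counts (NOT from the paper) that would give 85% / 100%.
print(f"sensitivity: {sensitivity(true_pos=17, false_neg=3):.0%}")
print(f"specificity: {specificity(true_neg=20, false_pos=0):.0%}")
```

A specificity of 100% means no correct model was flagged by mistake, while a sensitivity of 85% means roughly one erroneous model in seven escapes the metric.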

    An insight into imbalanced Big Data classification: outcomes and challenges

    Big Data applications have been emerging in recent years, and researchers from many disciplines are aware of the significant advantages of extracting knowledge from this type of problem. However, traditional learning approaches cannot be directly applied due to scalability issues. To overcome this issue, the MapReduce framework has arisen as a “de facto” solution. Basically, it carries out a “divide-and-conquer” distributed procedure in a fault-tolerant way that is suited to commodity hardware. As this is still a recent discipline, little research has been conducted on imbalanced classification for Big Data. The main reason is the difficulty of adapting standard techniques to the MapReduce programming style. Additionally, intrinsic problems of imbalanced data, namely lack of data and small disjuncts, are accentuated when the data are partitioned to fit the MapReduce programming style. This paper is built on three main pillars. First, to present the first outcomes for imbalanced classification in Big Data problems, introducing the current state of research in this area. Second, to analyze the behavior of standard pre-processing techniques in this particular framework. Finally, taking into account the experimental results obtained throughout this work, we will carry out a discussion on the challenges and future directions for the topic. This work has been partially supported by the Spanish Ministry of Science and Technology under Projects TIN2014-57251-P and TIN2015-68454-R, the Andalusian Research Plan P11-TIC-7765, the Foundation BBVA Project 75/2016 BigDaPTOOLS, and the National Science Foundation (NSF) Grant IIS-1447795.
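The abstract above notes that partitioning data for MapReduce accentuates the lack of minority-class examples. A minimal sketch of that effect follows, in plain Python with no MapReduce framework involved. The dataset size, the minority count, and all function names are assumptions made for illustration; the contiguous split only mimics how input chunks might be handed to mappers.

```python
import random


def make_imbalanced_labels(n_total, n_minority, seed=0):
    """Return a shuffled list of labels: 1 = minority class, 0 = majority."""
    labels = [1] * n_minority + [0] * (n_total - n_minority)
    random.Random(seed).shuffle(labels)
    return labels


def split_into_partitions(items, n_partitions):
    """Split items into contiguous, near-equal chunks (one per mapper)."""
    size, extra = divmod(len(items), n_partitions)
    partitions, start = [], 0
    for i in range(n_partitions):
        end = start + size + (1 if i < extra else 0)
        partitions.append(items[start:end])
        start = end
    return partitions


def minority_counts(partitions):
    """Number of minority-class examples seen by each partition."""
    return [sum(part) for part in partitions]


# Hypothetical dataset: 10,000 examples, only 50 from the minority class.
labels = make_imbalanced_labels(10_000, 50)
for n_partitions in (1, 10, 100):
    counts = minority_counts(split_into_partitions(labels, n_partitions))
    print(n_partitions, "partitions -> min/max minority per partition:",
          min(counts), max(counts))
```

With 100 partitions and only 50 minority examples, at least half of the partitions receive no minority example at all, so any model trained locally on such a partition cannot learn that class.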

    A machine learning approach for feature selection traffic classification using security analysis

    © 2018, Springer Science+Business Media, LLC, part of Springer Nature. Class imbalance has become a major problem that leads to inaccurate traffic classification. Accurate classification of traffic flows helps in security monitoring, IP management, intrusion detection, etc. Machine learning (ML) approaches are widely used in the literature to address the traffic classification problem. In this paper, we propose an ML-based hybrid feature selection algorithm named WMI_AUC that makes use of two metrics: the weighted mutual information (WMI) metric and the area under the ROC curve (AUC). These metrics select effective features from a traffic flow. To select robust features from among the selected features, we also propose a robust feature selection algorithm. The proposed approach increases the accuracy of ML classifiers and helps in detecting malicious traffic. We evaluate our work using 11 well-known ML classifiers on trace datasets from different network environments. Experimental results show that our algorithms achieve more than 95% flow accuracy.
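To illustrate how AUC can serve as a per-feature score, the sketch below ranks features by the AUC each one achieves when used alone as a classifier score. This is not the WMI_AUC algorithm from the abstract above, whose details are not given here; it is a generic single-feature AUC ranking, and the feature names and toy values are hypothetical.

```python
def feature_auc(values, labels):
    """AUC of one feature used directly as a score (labels are 0 or 1).

    Computed as the probability that a random positive example has a
    higher feature value than a random negative one; ties count as 0.5.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features_by_auc(columns, labels):
    """Sort features by how far their AUC is from 0.5 (chance level)."""
    scores = {name: feature_auc(vals, labels) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1] - 0.5), reverse=True)


# Toy flow data with hypothetical feature names.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "noise": [3, 1, 4, 1, 5, 2],
    "pkt_size": [1, 2, 3, 4, 5, 6],
}
for name, auc in rank_features_by_auc(columns, labels):
    print(name, auc)
```

Because AUC depends only on how positives rank against negatives, the score does not change with the class proportions, which is one reason it is a common choice when classes are imbalanced.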

    Non-lesional lupus skin contributes to inflammatory education of myeloid cells and primes for cutaneous inflammation

    Cutaneous lupus erythematosus (CLE) is a disfiguring and poorly understood condition frequently associated with systemic lupus. Previous studies suggest that nonlesional keratinocytes play a role in disease predisposition, but this has not been investigated in a comprehensive manner or in the context of other cell populations. To investigate CLE immunopathogenesis, normal-appearing skin, lesional skin, and circulating immune cells from lupus patients were analyzed via integrated single-cell RNA sequencing and spatial RNA sequencing. We demonstrate that normal-appearing skin of patients with lupus represents a type I interferon–rich, prelesional environment that skews gene transcription in all major skin cell types and markedly distorts predicted cell-cell communication networks. We also show that lupus-enriched CD16+ dendritic cells undergo robust interferon education in the skin, thereby gaining proinflammatory phenotypes. Together, our data provide a comprehensive characterization of lesional and nonlesional skin in lupus and suggest a role for skin education of CD16+ dendritic cells in CLE pathogenesis.
    http://deepblue.lib.umich.edu/bitstream/2027.42/192260/2/Nonlesional lupus skin contributes to inflammatory education of myeloid cells and primes for cutaneous inflammation.pdf Published versio

    Systems-based identification of the Hippo pathway for promoting fibrotic mesenchymal differentiation in systemic sclerosis

    Systemic sclerosis (SSc) is a devastating autoimmune disease characterized by excessive production and accumulation of extracellular matrix, leading to fibrosis of skin and other internal organs. However, the main cellular participants in SSc skin fibrosis remain incompletely understood. Here, using differentiation trajectories at the single-cell level, we demonstrate a dual source of extracellular matrix deposition in SSc skin from both myofibroblasts and endothelial-to-mesenchymal-transitioning cells (EndoMT). We further define a central role of Hippo pathway effectors in differentiation and homeostasis of myofibroblasts and EndoMTs, respectively, and show that myofibroblasts and EndoMTs function as central communication hubs that drive key pro-fibrotic signaling pathways in SSc. Together, our data help characterize myofibroblast differentiation and EndoMT phenotypes in SSc skin, and hint that modulation of the Hippo pathway may contribute to reversing the pro-fibrotic phenotypes in myofibroblasts and EndoMTs.
    http://deepblue.lib.umich.edu/bitstream/2027.42/192258/2/Systems-based identification of the Hippo pathway for promoting fibrotic mesenchymal differentiation in systemic sclerosis.pdf Accepted versio